Quality of Life and the Walkability of Counties
This project aims to examine the relationship between the walkability of a county and the quality of life its residents enjoy. It looks at health, economic conditions, and other quality of life factors such as air and water quality to determine whether a county's walkability has an impact on quality of life.
Research Questions:
- Can a connection be drawn between the walkability of counties and the quality of life of people living in those counties?
No strong connection could be drawn between the walkability of counties and the quality of life of the people living in them. There was no clear relationship between a county's air quality and its walkability. There was, however, a weak relationship between a county's walkability and its number of water quality violations per visit: counties with higher walkability index values tended to have fewer water quality violations.
- Are there connections between the walkability of a county and the health of its residents, and how important are these differences?
Clear relationships could be found between the walkability of a county and the physical health of its residents. Counties with lower walkability index scores tended to have higher rates of obesity and diabetes.
- How does the walkability of counties affect how well people are doing economically?
Walkability appears to be correlated with how well a county is doing economically. Higher walkability index scores tend to go with lower unemployment rates and higher median incomes, but also with higher costs of living.
Challenge Goals
The first challenge goal will be to use multiple datasets. This project will be centered around an EPA dataset that provides a walkability index score for US census block groups. In addition, it will use two more datasets: one covering rates of diabetes and obesity, and one covering quality of life and economic conditions.
The second challenge goal is to learn a new library, plotly, and use it to visualize the datasets.
Collaboration and Conduct
Students are expected to follow Washington state law on the Student Conduct Code for the University of Washington. In this course, students must:
- Indicate on your submission any assistance received, including materials distributed in this course.
- Not receive, generate, or otherwise acquire any substantial portion or walkthrough to an assessment.
- Not aid, assist, attempt, or tolerate prohibited academic conduct in others.
Update the following code cell to include your name and list your sources. If you used any kind of computer technology to help prepare your assessment submission, include the queries and/or prompts. Submitted work that is not consistent with sources may be subject to the student conduct process.
your_name = "David Oh"
sources = [
## Dataset and Sources related to Datasets
# Dataset 1
"https://catalog.data.gov/dataset/walkability-index8 - dataset source",
"""https://geodata.epa.gov/arcgis/rest/services/OA/WalkabilityIndex/MapServer/layers - website
with the walkability data source. Provides key information on how to use the data source.""",
"""https://www2.census.gov/geo/pdfs/reference/GARM/Ch11GARM.pdf - source used to understand the
data source. Queried - what is a census block group.""",
"""https://github.com/kjhealy/us-county/blob/master/data/census/fips-by-state.csv - used to
convert the census block groups of the dataset into county-level information. Queried -
Fips county data to county name dataset""",
# Dataset 2
"""https://www.kaggle.com/datasets/zacvaughan/cityzipcountyfips-quality-of-life
- dataset source""",
# Dataset 3
"""https://www.kaggle.com/datasets/sirishasingla1906/diabetes-prevalence-data
- dataset source""",
## Lessons
"Apr 14 Data Frames lesson",
"Apr 16 Groupby and Indexing lesson",
"Apr 21 Data Visualization lesson",
"May 14 Dissolve, Intersect, and Join lesson",
"Education Assessment",
## Debugging or figuring out how to do something
"""https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html - used to
concatenate the State Fips code and County Fips code to allow for datasets to be merged.
Queried - dataframe of int to dataframe of string""",
"""https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html - used to
figure out how to use groupby without setting the grouping column as the index.
Queried - groupby pandas""",
"""https://stackoverflow.com/questions/22216076/unicodedecodeerror-utf8-codec-cant-decode
-byte-0xa5-in-position-0-invalid-s - used to fix an error I received when trying to
read in the fips-by-state.csv file. Queried - UnicodeDecodeError: 'utf-8' codec can't
decode byte 0xb1 in position 43402: invalid start byte""",
"""https://www.digitalocean.com/community/tutorials/python-remove-character-from-string
- used to remove commas when cleaning the dollar values in the dataset. Queried - how
to remove specific characters from a string.""",
"""https://plotly.com/python/line-and-scatter/ - Used to work with plotly for one of my
challenge goals. Queried - plotly scatter plots""",
"""https://plotly.com/python/subplots/ - Used to understand how to work with subplots in
plotly. Queried - plotly subplots""",
"""https://stackoverflow.com/questions/56727843/how-can-i-create-subplots-with-plotly-
express - Used to understand how to make subplots using figures made with plotly express.
Queried - plotly subplots with plotly express""",
"""https://plotly.com/python/getting-started/ - Used to figure out how to install the plotly
library. Queried - python install plotly""",
"""https://stackoverflow.com/questions/74325447/how-to-plot-multiple-scatterplots-
with-trendlines-as-subplots-using-plotly - Used to understand how to access the trendline
produced for a scatterplot with plotly express so that it can also be graphed on a
subplot. Queried - plotly trendlines on subplots with plotly express""",
"""https://plotly.com/python/figure-labels/ - Used to understand how to set the title
for plotly figures. Queried - plotly figure title"""
]
assert your_name != "", "your_name cannot be empty"
assert ... not in sources, "sources should not include the placeholder ellipsis"
assert len(sources) >= 6, "must include at least 6 sources, inclusive of lectures and sections"
Data Setting and Methods
The EPA walkability dataset is the main dataset that this project looks at. It contains data points at the census block group level, including a column of walkability index values (NatWalkInd) in which a higher value indicates a more walkable area. To use it with the other two datasets, one on diabetes and obesity rates and the other on quality of life and economic conditions, the census block groups will have to be aggregated to the county level. This requires a method for combining census block groups into counties, along with a method for attaching county names to the county FIPS codes in the EPA walkability dataset.
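A minimal sketch of the county-level aggregation, assuming the block-group table has integer STATEFP, COUNTYFP, and NatWalkInd columns as in the EPA file. The rows below are made-up illustration values, not EPA data. Each county's value is the unweighted mean of its block groups' walkability index, matching the groupby(...).mean() step used later in this notebook.

```python
import pandas as pd

# Hypothetical block-group rows (illustrative values only, not EPA data).
block_groups = pd.DataFrame({
    "STATEFP": [1, 1, 1, 55],
    "COUNTYFP": [1, 1, 3, 141],
    "NatWalkInd": [6.0, 10.0, 4.5, 7.0],
})

# State code * 1000 + county code gives the five-digit county FIPS code as an integer.
block_groups["fips"] = block_groups["STATEFP"] * 1000 + block_groups["COUNTYFP"]

# Collapse block groups into counties by averaging their walkability index.
county_walk = block_groups.groupby("fips", as_index=False)["NatWalkInd"].mean()
print(county_walk)
```

The two block groups in county 1001 average to 8.0, while counties with a single block group keep that block group's value.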
The dataset on quality of life and economic conditions stores percentages and dollar amounts as strings, such as "1%" and "$1". A method will be required to convert these strings into float values.
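As an alternative to converting values one at a time, pandas can strip the symbols from a whole column and parse it in one step. This is a sketch, not the approach used in this notebook. The sample strings are illustrative, "n/a" is a hypothetical unparseable entry, and errors="coerce" turns anything that still fails to parse into NaN instead of raising an error.

```python
import pandas as pd

# Illustrative strings in the formats used by the quality of life dataset,
# plus a hypothetical entry that is not a number.
raw = pd.Series(["3.17%", "$74,899.78", "$400", "n/a"])

# Remove dollar signs, thousands separators, and percent signs, then parse.
# Values that still are not numbers become NaN.
cleaned = pd.to_numeric(raw.str.replace(r"[$,%]", "", regex=True), errors="coerce")
print(cleaned)
```

Because the whole column is parsed at once, the result has a float64 dtype.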
After the EPA walkability index values are merged with the other two datasets, the values of interest need to be plotted. There will be one plot for the percentage of days on which a county had a good air quality index (AQI), another for the number of water quality violations per visit, and two more for the health and the economic conditions of counties.
Results
Setting up the data and libraries
The first steps are to install and import the necessary libraries and to read in the datasets from the CSV files:
%pip install "notebook>=7.0" "anywidget>=0.9.13"
%pip install plotly
%pip install "plotly[express]"
import doctest
import pandas as pd
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
walking_data = pd.read_csv("epa_data.csv")
diabetes_data = pd.read_csv("diabetes.csv")
qol_data = pd.read_csv("qol_data.csv")
fips_by_state = pd.read_csv("fips-by-state.csv", encoding='unicode_escape')
Next, the necessary columns are selected from each dataset, and the main dataset is given a column of five-digit FIPS codes, in which the first two digits are the FIPS code of the state the census block group is in and the last three digits are the FIPS code of its county:
walking = walking_data[["NatWalkInd", "COUNTYFP", "STATEFP"]].copy()
diabetes = diabetes_data[["FIPS.Codes", "percent.men.diabetes", "percent.women.diabetes",
"percent.men.obese", "percent.women.obese"]]
qol = qol_data[["FIPS", "Unemployment", "AQI%Good", "WaterQualityVPV", "Cost of Living",
"2022 Median Income"]].copy()
fips = fips_by_state[["fips"]]
# Combines the state FIPS code (2 digits) and the county FIPS code (3 digits) into a
# single five-digit FIPS code. Both codes are zero-padded first because, read as
# integers, they lose their leading zeros (state 1, county 1 must become "01001").
walking["fips"] = (walking["STATEFP"].astype(str).str.zfill(2) +
walking["COUNTYFP"].astype(str).str.zfill(3)).astype(int)
# Averages the census block groups in the walkability dataset by the county they
# are a part of.
walking = walking.groupby("fips", as_index=False)[["NatWalkInd"]].mean()
# Merges the walkability dataset with the county dataset on the five-digit FIPS code
# (an inner merge, so only codes found in the county dataset are kept).
walking = walking.merge(fips, on="fips")
walking
| | fips | NatWalkInd |
|---|---|---|
| 0 | 1101 | 11.706030 |
| 1 | 1103 | 6.202222 |
| 2 | 1105 | 4.083333 |
| 3 | 1107 | 4.543860 |
| 4 | 1109 | 5.362319 |
| ... | ... | ... |
| 1246 | 55133 | 8.675585 |
| 1247 | 55135 | 5.979675 |
| 1248 | 55137 | 4.952381 |
| 1249 | 55139 | 9.762087 |
| 1250 | 55141 | 7.039352 |
1251 rows × 2 columns
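Building the five-digit code by concatenating integer columns is fragile: read as integers, a state code such as 01 and a county code such as 001 lose their leading zeros, so str(1) + str(1) gives "11" rather than "01001". When such codes are then merged against a list of valid FIPS codes, they find no match and those counties silently drop out, which may explain why the table above starts at 1101 rather than at Alabama's first county, 1001. A stdlib-only sketch comparing the two approaches (both helper names are hypothetical):

```python
def naive_fips(state: int, county: int) -> int:
    # Hypothetical helper: concatenates the codes without padding (the fragile approach).
    return int(str(state) + str(county))


def padded_fips(state: int, county: int) -> int:
    # Hypothetical helper: pads the state to 2 digits and the county to 3 before joining.
    return int(str(state).zfill(2) + str(county).zfill(3))


# Autauga County, Alabama: state 01, county 001 -> 01001.
print(naive_fips(1, 1), padded_fips(1, 1))
# Counties numbered 100 or higher come out the same either way.
print(naive_fips(1, 101), padded_fips(1, 101))
```

Only counties with three-digit county codes survive the naive version unchanged, so the error is easy to miss when inspecting the first few rows.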
The main dataset is then merged with the other two datasets on their FIPS codes, starting with the diabetes dataset:
walking = walking.merge(diabetes, left_on="fips", right_on="FIPS.Codes", how="left")
walking
| | fips | NatWalkInd | FIPS.Codes | percent.men.diabetes | percent.women.diabetes | percent.men.obese | percent.women.obese |
|---|---|---|---|---|---|---|---|
| 0 | 1101 | 11.706030 | 1101 | 13.4 | 15.1 | 30.3 | 36.4 |
| 1 | 1103 | 6.202222 | 1103 | 12.1 | 11.7 | 32.9 | 31.5 |
| 2 | 1105 | 4.083333 | 1105 | 16.8 | 19.3 | 32.6 | 42.0 |
| 3 | 1107 | 4.543860 | 1107 | 15.5 | 15.8 | 35.4 | 39.2 |
| 4 | 1109 | 5.362319 | 1109 | 13.6 | 15.5 | 32.4 | 35.5 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1246 | 55133 | 8.675585 | 55133 | 8.1 | 6.7 | 29.5 | 24.6 |
| 1247 | 55135 | 5.979675 | 55135 | 10.2 | 8.5 | 27.9 | 23.6 |
| 1248 | 55137 | 4.952381 | 55137 | 9.8 | 8.6 | 33.0 | 30.9 |
| 1249 | 55139 | 9.762087 | 55139 | 9.7 | 8.7 | 29.5 | 26.8 |
| 1250 | 55141 | 7.039352 | 55141 | 10.4 | 9.1 | 35.4 | 33.1 |
1251 rows × 7 columns
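Because this merge uses how="left", counties with no match in the diabetes dataset are kept with NaN values rather than dropped. A sketch of how merge(..., indicator=True) can count those unmatched counties; the tiny tables here are made-up stand-ins for walking and diabetes, not rows from the real data.

```python
import pandas as pd

# Made-up stand-ins for the walkability and diabetes tables.
walk = pd.DataFrame({"fips": [1101, 1103, 1105], "NatWalkInd": [10.0, 6.0, 4.0]})
diab = pd.DataFrame({"FIPS.Codes": [1101, 1105], "percent.men.diabetes": [13.0, 17.0]})

# indicator=True adds a "_merge" column that is "both" or "left_only" for a left merge.
merged = walk.merge(diab, left_on="fips", right_on="FIPS.Codes", how="left",
                    indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "fips"]
print(f"{len(unmatched)} county(ies) without diabetes data: {list(unmatched)}")
```

The same check works for the later merge with the quality of life dataset.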
Before the quality of life dataset can be merged and plotted, its string values have to be converted into floats:
def clean_number_values(value):
    """
    Takes in a string that represents a number, such as a dollar amount with a leading
    dollar sign or a percentage with a trailing percent symbol, and returns the number as
    a float. For example, 1.0 for "1%" or 4000.53 for "$4000.53". Values that are not
    strings (such as missing values) are converted with float(), and strings that still
    cannot be parsed become NaN.
    >>> clean_number_values("87.86%")
    87.86
    >>> clean_number_values("$76,653.78")
    76653.78
    >>> clean_number_values("$400")
    400.0
    """
    if not isinstance(value, str):
        return float(value)
    # Strips the dollar sign, percent symbol, and thousands separators before converting.
    try:
        return float(value.strip().lstrip("$").rstrip("%").replace(",", ""))
    except ValueError:
        return float("nan")


doctest.run_docstring_examples(clean_number_values, globals())
# Converts each string-valued column to floats. Assigning the whole column (rather than
# using .loc[:, column]) lets pandas give the column a float dtype.
for column in ["Unemployment", "AQI%Good", "Cost of Living", "2022 Median Income"]:
    qol[column] = qol[column].apply(clean_number_values)
walking = walking.merge(qol, left_on="fips", right_on="FIPS", how="left")
walking
| | fips | NatWalkInd | FIPS.Codes | percent.men.diabetes | percent.women.diabetes | percent.men.obese | percent.women.obese | FIPS | Unemployment | AQI%Good | WaterQualityVPV | Cost of Living | 2022 Median Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1101 | 11.706030 | 1101 | 13.4 | 15.1 | 30.3 | 36.4 | 1101.0 | 3.17 | 80.94 | 1.0 | 74899.78 | 64886.16 |
| 1 | 1103 | 6.202222 | 1103 | 12.1 | 11.7 | 32.9 | 31.5 | 1103.0 | 2.14 | 80.94 | 0.0 | 69001.51 | 65752.8 |
| 2 | 1105 | 4.083333 | 1105 | 16.8 | 19.3 | 32.6 | 42.0 | 1105.0 | 5.55 | 80.94 | 1.0 | 65313.37 | 36420.49 |
| 3 | 1107 | 4.543860 | 1107 | 15.5 | 15.8 | 35.4 | 39.2 | 1107.0 | 3.36 | 80.94 | 2.0 | 68146.02 | 55549.23 |
| 4 | 1109 | 5.362319 | 1109 | 13.6 | 15.5 | 32.4 | 35.5 | 1109.0 | 2.81 | 80.94 | 1.0 | 67633.43 | 58535.25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1246 | 55133 | 8.675585 | 55133 | 8.1 | 6.7 | 29.5 | 24.6 | 55133.0 | 2.56 | 87.89 | 0.0 | 83046.59 | 111023.52 |
| 1247 | 55135 | 5.979675 | 55135 | 10.2 | 8.5 | 27.9 | 23.6 | 55135.0 | 3.07 | 87.89 | 0.0 | 68886.81 | 74928.52 |
| 1248 | 55137 | 4.952381 | 55137 | 9.8 | 8.6 | 33.0 | 30.9 | 55137.0 | 3.53 | 87.89 | 0.0 | 70582.8 | 66457.21 |
| 1249 | 55139 | 9.762087 | 55139 | 9.7 | 8.7 | 29.5 | 26.8 | 55139.0 | 2.61 | 87.89 | 0.0 | 70103.93 | 79314.09 |
| 1250 | 55141 | 7.039352 | 55141 | 10.4 | 9.1 | 35.4 | 33.1 | 55141.0 | 3.45 | 87.89 | 0.0 | 69523.85 | 71662.19 |
1251 rows × 13 columns
Plots
Each of the research questions will be answered by creating plots to see relationships between walkability and quality of life, health, and economic conditions.
Research Question 1: Quality of Life
def plot_air_quality(walking_dataset):
    """
    Takes in a pandas dataframe and creates a plotly plot that compares each county's EPA
    walkability index value with the county's percentage of days with a good AQI.
    """
    fig = px.scatter(walking_dataset, x="NatWalkInd", y="AQI%Good",
                     labels={
                         "NatWalkInd": "EPA Walkability Index",
                         "AQI%Good": "Percent of Days with Good Air Quality"
                     })
    fig.update_layout(
        title={
            "text": "County Air Quality and County Walkability",
            "x": 0.5,
            "xanchor": "center",
            "yanchor": "top"
        })
    fig.show(renderer="notebook")
plot_air_quality(walking)
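Based on the plot, no clear relationship can be drawn between a county's walkability and its air quality. I expected counties with higher walkability to have better air quality, since their residents would have less need to drive. One possible explanation for this unexpected result is that other factors have a stronger effect on air quality. It is also worth noting that, in the rows displayed in the merged table above, AQI%Good is identical for every county within a state (80.94 for each Alabama county and 87.89 for each Wisconsin county shown). This suggests that the air quality values may be reported at the state level rather than for each county; if so, the plot cannot show county-level differences in air quality.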
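One way to test whether AQI%Good actually varies between counties is to count its distinct values within each state, taking the state code as the county FIPS code without its last three digits (fips // 1000). A sketch on made-up rows; on the real walking table, a count of 1 for most states would suggest the air quality values are reported per state rather than per county.

```python
import pandas as pd

# Made-up rows: state 1 repeats one AQI value for every county, state 55 does not.
df = pd.DataFrame({
    "fips": [1101, 1103, 1105, 55133, 55135],
    "AQI%Good": [80.0, 80.0, 80.0, 85.0, 90.0],
})

# The state FIPS code is the county FIPS code with its last three digits removed.
df["state"] = df["fips"] // 1000

# Number of distinct AQI%Good values per state; 1 means no county-level variation.
distinct_per_state = df.groupby("state")["AQI%Good"].nunique()
print(distinct_per_state)
```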
def plot_water_quality_violations(walking_dataset):
    """
    Takes in a pandas dataframe and creates a plotly plot comparing each county's EPA walkability
    index value to the county's number of water quality violations per visit.
    """
    # Excludes counties whose WaterQualityVPV equals -1, as the dataset uses -1 for
    # counties without enough data.
    water_data = walking_dataset[walking_dataset["WaterQualityVPV"] != -1]
    fig = px.scatter(water_data, x="NatWalkInd", y="WaterQualityVPV",
                     labels={
                         "NatWalkInd": "EPA Walkability Index",
                         "WaterQualityVPV": "Water Quality Violations Per Visit"
                     })
    fig.update_layout(
        title={
            "text": "County Water Quality Violations Per Visit and Walkability",
            "x": 0.5,
            "xanchor": "center",
            "yanchor": "top"
        })
    fig.show(renderer="notebook")
plot_water_quality_violations(walking)
A small relationship can be seen in the data: counties with higher walkability index values tend to have fewer water quality violations per visit.
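One way to gauge the size of this small relationship is a rank-based (Spearman) correlation, which is less sensitive than Pearson's r to a few counties with unusually many violations. The sketch below applies the same `-1` exclusion as the plotting function. It is an illustration under stated assumptions, not the method used in the project: the helper name `spearman_excluding_sentinel` is hypothetical and the toy values are made up.

```python
import pandas as pd


def spearman_excluding_sentinel(df: pd.DataFrame, column: str, sentinel: float = -1) -> float:
    """Hypothetical helper: Spearman rank correlation between NatWalkInd and `column`.

    Rows where `column` equals `sentinel` (the dataset's "not enough data" code)
    or where either value is missing are dropped before ranking.
    """
    valid = df.loc[df[column] != sentinel, ["NatWalkInd", column]].dropna()
    # Spearman's rho is Pearson's r computed on the ranks; tied values share their average rank.
    return valid["NatWalkInd"].rank().corr(valid[column].rank())


# Made-up values for illustration; -1 marks a county without enough data.
toy = pd.DataFrame({
    "NatWalkInd": [2.0, 4.0, 6.0, 8.0, 10.0],
    "WaterQualityVPV": [0.5, -1, 0.3, 0.2, 0.1],
})
print(spearman_excluding_sentinel(toy, "WaterQualityVPV"))
```

For the toy data, dropping the `-1` row gives a rho of -1.0, while ranking it as if it were a real value gives -0.4, which shows why the placeholder has to be filtered before any statistic is computed. In the notebook the call would be `spearman_excluding_sentinel(walking, "WaterQualityVPV")`.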
Research Question 2: Health
def plot_diabetes(walking_dataset):
    """
    Takes in a pandas dataframe of EPA walkability index values for counties and creates two
    side-by-side plots comparing each county's EPA walkability index value to the percentage
    of men with diabetes (first plot) and the percentage of women with diabetes (second plot).
    """
    fig = make_subplots(1, 2, shared_yaxes=True, shared_xaxes=True,
                        subplot_titles=("Men", "Women"))
    fig.update_layout(
        title={
            "text": "Percentage of Diabetes and EPA Walkability Index",
            "x": 0.5,
            "xanchor": "center",
            "yanchor": "top"
        },
        yaxis={
            "title": "Percentage of Population with Diabetes"
        })
    fig.update_xaxes(title_text="EPA Walkability Index", row=1, col=1)
    fig.update_xaxes(title_text="EPA Walkability Index", row=1, col=2)
    # With trendline="ols" (requires statsmodels), data[0] is the scatter trace
    # and data[1] is the fitted OLS line.
    scatter1 = px.scatter(walking_dataset, x="NatWalkInd", y="percent.men.diabetes",
                          trendline="ols")
    scatter2 = px.scatter(walking_dataset, x="NatWalkInd", y="percent.women.diabetes",
                          trendline="ols")
    fig.add_trace(scatter1["data"][0], row=1, col=1)
    fig.add_trace(scatter1["data"][1], row=1, col=1)
    fig.add_trace(scatter2["data"][0], row=1, col=2)
    fig.add_trace(scatter2["data"][1], row=1, col=2)
    fig.show(renderer="notebook")
plot_diabetes(walking)
There is a clear relationship between the walkability of a county and its rates of diabetes: counties with higher walkability index values tend to have lower diabetes rates among both men and women.
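The `trendline="ols"` option draws an ordinary least squares line in each panel. To read off how steep those lines are, the slope can be computed directly: for a single predictor, `numpy.polyfit` with `deg=1` minimizes the same squared error, so it should produce the same line. The helper names `ols_line` and `slopes_by_column` are hypothetical, and the toy data are made up.

```python
import numpy as np
import pandas as pd


def ols_line(df: pd.DataFrame, y_column: str, x_column: str = "NatWalkInd") -> tuple[float, float]:
    """Hypothetical helper: (slope, intercept) of the least-squares line y = slope * x + intercept."""
    pair = df[[x_column, y_column]].dropna()
    slope, intercept = np.polyfit(pair[x_column], pair[y_column], deg=1)
    return float(slope), float(intercept)


def slopes_by_column(df: pd.DataFrame, columns: list[str]) -> pd.Series:
    """Hypothetical helper: fitted slope (change in y per one-point increase in NatWalkInd)."""
    return pd.Series({col: ols_line(df, col)[0] for col in columns})


# Made-up data: men lie exactly on y = 12 - 0.5x, women on y = 10.5 - 0.25x.
toy = pd.DataFrame({
    "NatWalkInd": [2.0, 4.0, 6.0, 8.0],
    "percent.men.diabetes": [11.0, 10.0, 9.0, 8.0],
    "percent.women.diabetes": [10.0, 9.5, 9.0, 8.5],
})
print(slopes_by_column(toy, ["percent.men.diabetes", "percent.women.diabetes"]))
```

In the notebook, `slopes_by_column(walking, ["percent.men.diabetes", "percent.women.diabetes"])` would report each panel's slope in percentage points per index point; a negative value matches lower diabetes rates in more walkable counties.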
def plot_obesity(walking_dataset):
    """
    Takes in a pandas dataframe of EPA walkability index values for counties and creates two
    side-by-side plots comparing each county's EPA walkability index value to the percentage
    of men considered obese (first plot) and the percentage of women considered obese
    (second plot).
    """
    # Label the panels so readers can tell which plot shows men and which shows women.
    fig = make_subplots(1, 2, shared_yaxes=True,
                        subplot_titles=("Men", "Women"))
    fig.update_layout(
        title={
            "text": "Percentage of Obesity and EPA Walkability Index",
            "x": 0.5,
            "xanchor": "center",
            "yanchor": "top"
        },
        yaxis={
            "title": "Percentage of Population Considered Obese"
        })
    fig.update_xaxes(title_text="EPA Walkability Index", row=1, col=1)
    fig.update_xaxes(title_text="EPA Walkability Index", row=1, col=2)
    # With trendline="ols" (requires statsmodels), data[0] is the scatter trace
    # and data[1] is the fitted OLS line.
    scatter1 = px.scatter(walking_dataset, x="NatWalkInd", y="percent.men.obese",
                          trendline="ols")
    scatter2 = px.scatter(walking_dataset, x="NatWalkInd", y="percent.women.obese",
                          trendline="ols")
    fig.add_trace(scatter1["data"][0], row=1, col=1)
    fig.add_trace(scatter1["data"][1], row=1, col=1)
    fig.add_trace(scatter2["data"][0], row=1, col=2)
    fig.add_trace(scatter2["data"][1], row=1, col=2)
    fig.show(renderer="notebook")
plot_obesity(walking)
Once again, there is a clear relationship between obesity rates and the walkability of a county: counties with higher walkability index values tend to have lower obesity rates. This suggests that greater walkability may improve residents' health, since people in more walkable counties are more likely to walk to their destinations rather than drive, resulting in better physical health.
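Research question 2 also asks how important these differences are. A simple way to express their size is to split counties into thirds by walkability and compare average obesity in the least and most walkable thirds. This is a sketch, not part of the original analysis; `mean_by_walkability_tertile` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def mean_by_walkability_tertile(df: pd.DataFrame, column: str) -> pd.Series:
    """Hypothetical helper: split counties into three equal-sized NatWalkInd groups
    (pd.qcut) and return the mean of `column` within each group."""
    groups = pd.qcut(df["NatWalkInd"], q=3, labels=["low", "middle", "high"])
    return df.groupby(groups, observed=True)[column].mean()


# Made-up values for illustration only.
toy = pd.DataFrame({
    "NatWalkInd": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "percent.men.obese": [40.0, 38.0, 35.0, 33.0, 30.0, 28.0],
})
means = mean_by_walkability_tertile(toy, "percent.men.obese")
# Gap in percentage points between the least and most walkable thirds.
print(means["low"] - means["high"])
```

Reporting the gap in percentage points (for example, `mean_by_walkability_tertile(walking, "percent.women.obese")` in the notebook) gives a concrete sense of scale that a scatter plot alone does not. Note that `pd.qcut` raises an error if many counties share the same index value at a bin edge.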
Research Question 3: Economic
def plot_economic(walking_dataset):
    """
    Takes in a pandas dataframe of EPA walkability index values for counties and produces three
    stacked plots comparing EPA walkability index values to unemployment rates (first plot),
    2022 median incomes (second plot), and the county's cost of living (third plot).
    """
    fig = make_subplots(3, 1)
    fig.update_layout(
        title={
            "text": "Economic Conditions and EPA Walkability Index",
            "x": 0.5,
            "xanchor": "center",
            "yanchor": "top"
        },
        height=1000)
    fig.update_xaxes(title_text="EPA Walkability Index", row=1, col=1)
    fig.update_xaxes(title_text="EPA Walkability Index", row=2, col=1)
    fig.update_xaxes(title_text="EPA Walkability Index", row=3, col=1)
    fig.update_yaxes(title_text="Unemployment Percentage", row=1, col=1)
    fig.update_yaxes(title_text="County Median Income in 2022 in Dollars", row=2, col=1)
    fig.update_yaxes(title_text="Cost of Living in Dollars", row=3, col=1)
    # With trendline="ols" (requires statsmodels), data[0] is the scatter trace
    # and data[1] is the fitted OLS line.
    scatter1 = px.scatter(walking_dataset, x="NatWalkInd", y="Unemployment",
                          trendline="ols")
    scatter2 = px.scatter(walking_dataset, x="NatWalkInd", y="2022 Median Income",
                          trendline="ols")
    scatter3 = px.scatter(walking_dataset, x="NatWalkInd", y="Cost of Living",
                          trendline="ols")
    fig.add_trace(scatter1["data"][0], row=1, col=1)
    fig.add_trace(scatter1["data"][1], row=1, col=1)
    fig.add_trace(scatter2["data"][0], row=2, col=1)
    fig.add_trace(scatter2["data"][1], row=2, col=1)
    fig.add_trace(scatter3["data"][0], row=3, col=1)
    fig.add_trace(scatter3["data"][1], row=3, col=1)
    fig.show(renderer="notebook")
plot_economic(walking)
There are relationships between walkability and all three factors considered. Counties with higher walkability index values tend to have lower unemployment, higher median incomes, and, interestingly, higher costs of living. This raises the question of whether the differences seen in this project are caused by walkability itself, or whether wealthier counties simply have better infrastructure, and therefore higher walkability index scores, with their residents' better physical health also being a result of greater wealth.
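One way to probe whether wealth explains both walkability and health is a partial correlation: remove the linear effect of median income from both walkability and a health measure, then correlate what is left over. This is a hedged sketch rather than part of the project's analysis. It controls for only one variable, only linearly, and cannot establish causation. The helper name `partial_correlation` is hypothetical and the toy values are made up.

```python
import numpy as np
import pandas as pd


def partial_correlation(df: pd.DataFrame, x: str, y: str, control: str) -> float:
    """Hypothetical helper: correlation between x and y after removing the linear
    effect of `control` from both (a first-order partial correlation)."""
    data = df[[x, y, control]].dropna()
    c = data[control].to_numpy(dtype=float)

    def residuals(col: str) -> np.ndarray:
        # Fit col = slope * control + intercept by least squares; keep the leftover part.
        values = data[col].to_numpy(dtype=float)
        slope, intercept = np.polyfit(c, values, deg=1)
        return values - (slope * c + intercept)

    return float(np.corrcoef(residuals(x), residuals(y))[0, 1])


# Made-up data where income drives both columns, plus an opposite-signed wobble.
income = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
wobble = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
toy = pd.DataFrame({
    "2022 Median Income": income,
    "NatWalkInd": income + wobble,
    "percent.men.obese": 3 * income - wobble,
})
raw = toy["NatWalkInd"].corr(toy["percent.men.obese"])
partial = partial_correlation(toy, "NatWalkInd", "percent.men.obese", "2022 Median Income")
print(raw, partial)
```

In the toy data the raw correlation is positive only because both columns rise with income; once income is removed, the remaining association is negative. In the notebook, a call such as `partial_correlation(walking, "NatWalkInd", "percent.men.obese", "2022 Median Income")` would show whether the walkability and obesity relationship survives this simple income adjustment.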
Implications and Limitations
This analysis would benefit areas of the US that suffer from poor walkability. The data make it clear that there are correlations between physical health and walkability, and as such, creating walkable cities should be a goal. However, one limitation of this analysis is that it considers only walkability, so it should not be used on its own to decide which areas of the US need more aid; many other factors should be considered. The analysis would also likely overlook areas that are struggling with problems other than walkability, or where walkability is not the most pressing issue.
A second limitation is that the walkability data are reported at the census block group level and had to be combined into county-level values, so the walkability of counties may not be completely accurate.
A third limitation is that the analysis examines only the relationships between walkability and other factors. Although it seems reasonable to assume that walkability can cause some of the differences seen in the plots, particularly for health and perhaps for economic conditions, there could be other reasons for these relationships. For example, wealthier communities may be able to build more walkable areas, which would lead walkable counties to have higher median incomes without walkability being the cause.
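The second limitation depends on how block-group scores are combined into a county score, a step not shown in this section. One way to see how much that choice matters is to compare a simple mean with a population-weighted mean. This is a minimal sketch under assumptions: the column names `county_fips` and `population` are hypothetical placeholders (the EPA data's actual field names may differ), and the toy rows are made up.

```python
import pandas as pd


def county_walkability(block_groups: pd.DataFrame, weight_column: str = "population") -> pd.DataFrame:
    """Hypothetical helper: aggregate block-group NatWalkInd scores to counties two ways.

    Assumes hypothetical columns "county_fips", "NatWalkInd", and `weight_column`.
    A county whose weights sum to zero gets NaN or inf in "weighted_mean".
    """
    bg = block_groups.dropna(subset=["NatWalkInd", weight_column])
    # Sum of score * weight per county, divided by total weight per county.
    weighted_sum = (bg["NatWalkInd"] * bg[weight_column]).groupby(bg["county_fips"]).sum()
    total_weight = bg.groupby("county_fips")[weight_column].sum()
    return pd.DataFrame({
        "simple_mean": bg.groupby("county_fips")["NatWalkInd"].mean(),
        "weighted_mean": weighted_sum / total_weight,
    })


# Made-up block groups: county "A" has one dense, walkable block group and one sparse one.
toy = pd.DataFrame({
    "county_fips": ["A", "A", "B", "B"],
    "NatWalkInd": [10.0, 2.0, 5.0, 5.0],
    "population": [900, 100, 50, 50],
})
print(county_walkability(toy))
```

For county "A" the simple mean is 6.0 while the population-weighted mean is 9.2, because most residents live in the walkable block group. A large gap between the two columns for real counties would indicate that the aggregation choice could shift the conclusions.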